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ABSTRACT 

We investigate the formation of /3-sheet structures in proteins without 
sequence-dependent side-chain interactions. To accomplish this, we introduce 
a model which explicitly incorporates both solvation effects and the angular 
dependence (on the protein backbone) of hydrogen bond formation. The ther- 
modynamics of this model is studied by exploring the density of states for the 
entire system and the local couplings in a partially folded structure. Our results 
suggest that solvation dynamics together with the H-bond angular dependence 
gives rise to a generic cooperativity in this class of systems; this result ex- 
plains why pathological aggregates involving /3-sheet cores can form from many 
different proteins. Our work provides the foundation for the construction of 
phenomenological models to investigate topology effects in /3-sheet folding and 
the competition between native folding and non-specific aggregation. 

I. INTRODUCTION 

/3-sheets are an important element of protein structure, occurring often both in functional 
units and in pathological aggregates. For example, signaling proteins such as SH3 contain a 
(3 core. On the other hand, the precursors of amyloidogenesis are commonly found to have 
a /3-sheet organization!; moreover, a conformational rearrangement can switch a functional 
/3-sheet into an aggregation unit!'!. As they play the role of stabilizing highly organized 
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architectures (either pathological or functional), /3-sheets must be strongly cooperative units, 
responsible for lowering the free energy of these folded structures as compared to entropy- 
dominated more irregular patterns. 

It is often assumed that hydrophobic interactions are the major forces contributing to 
/5-sheet cooperativity and their interacting pattern (i.e., the sequence design) can define a 
unique fold!. But, one cannot ignore the fact that many neurodegenerative diseases are 
caused by the aggregation of mutant proteins with long hydrophilic sequences (e.g., poly- 
glutamine in Huntington diseasei) ; the /3-sheet at the core of these aggregates are certainly 
not being stabilized by hydrophobic forces. Indeed, recent experiments have shown that 
peptides with purely hydrophilic sequences can fold /3-sheets cooperatively!. Also, growing 
evidence suggests that proteins are capable of forming /3-sheet aggregates, regardless of their 
native folds!. Together with the fact that most amyloidogenetic precursors do not share 
any homologous sequence!, it is therefore very likely that throughout the 2-dimensional /3- 
sheet architecture, formation of an H-bond network can generate long-range ordering, hence 
compensating the mismatch of side-chain interactions and stabilizing the system. 

What are the basic differences between backbone H-bond and side-chain hydrophobic 
interactions? Besides the difference in sequence specificity discussed above, they also differ 
in the maximal number of contacts for each interaction unit. Along the peptide chain, 
one amino acid can form at most two backbone H-bonds with adjacent /9-strands in the 
2-dimensional /3-sheet structure. In contrast, for each hydrophobic residue in a compact 
protein, the number of favorable (sequence specific) contacts can be more than two; this leads 
to many-body effects that can define and stabilize a unique foidil!. In most protein folding 
literature, the hydrophobic many-body effect is mimicked by the use of Go-type interaction^ 
and sequence-specific restrictions on the backbone energy0 to favor the native state. This 
approach, however, is not generally applicable to the case of sequence-independent /3-sheet 
formation. 

Moreover, from a statistical mechanics prospective, it is well known that having coop- 
erative behavior in a 2-dimensional, pairwise-interacting system requires more than two 
couplings (in average) for each interaction unitH In protein folding, this corresponds to the 
case that folding of a specific part of the overall structure will aid its contiguous contact 
formation without redundant entropy loss, as briefly discussed by Berriz et a£l. At the first 
glance, this principle does not apply to the case of H-bond network formation in an obvious 



3 



way. Thus, elucidating how the networking of individual H-bond can give rise to a stable 
folded structure is the major goal of present work. 

For this purpose, we study a model system without specific side-chain interaction. It 
has already been shown that for /5-structured proteins with randomly assigned sequences, 
a pure Lennard- Jones potential might not be sufficient to provide systemic cooperativity,0 
and angular dependence in contact formation must be taken into account^!. Also, there is 
growing evidence that solvent effects play a significant role in protein folding and conforma- 
tional changeslrHl! Thus, we introduce a new approach to integrate these two effects 
in modeling H-bond formation. The ingredients of our approach are a "co-plane" parame- 
terization of the backbone degrees of freedom with explicit couplings of solvation dynamics 
and the angular dependence of hydrogen-bond formation. The systemic cooperativity is 
investigated by exploring the density of states for /3-sheets of different size and structure. 
To facilitate the sampling, we have used replica-exchange@ and multi-canonical rescalingill 
techniques. The origin of this global cooperativity is further studied by comparing energetics 
and thermodynamic stability of local sub-structures of model systems with different types 
of interaction potentials. 

Our results indicate that it is the de-solvation cavity formation (i.e., the granularity 
of the water molecule) and the hydrogen-bond angular dependence that provide /3-sheets 
with a systemic cooperativity. Since this property is sequence-independent, our results 
suggest why /3-sheets can generically serve as the building blocks for constructing functional 
macromolecules, and why most functional proteins have the tendency to form aggregates at 
sufficiently high concentration. 

II. METHOD 



A. The model system 

The system proposed here is based on the "ball-and-stick" model introduced in Ref. 122 



In this approach, all NH, C a , CO (represented as C), and Cp side-chains are treated as 
lumped balls. The local backbone energy consists of a sum of terms: 

1. bond bending: V e = \ Yl k$(0i — do) 2 
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2. dihedral rotations: V w = e^l+cosu^), V^, = -J] e«/,(l+cos 30j), V^, = |J]e^(l + 
cos 3ipi), 

3. side-chain chirality: V chimZ = J2 e chirai® ({[^vc Q x r C/J cJ • ^C'CajJ 

with the sum taken over all components (labeled by i), and O(s) = (s + |s|)/2. Here, since 
we are interested in sequence-independent effects, all side-chains are treated as polar or 
weakly charged groups that favor /5-sheet formation, with interaction strengthsB@ that are 
far less than the magnitude of the H-bond energy and can be generally ignored. Also, we 
take the parameters kg = 200kcal/mol • rad 2 , e w = 40kcal/mol, = = 0.45kcal/mol, and 
tchirai = 10kcal/mol from Ref. 



B. The solvation force field 



Next, we construct expressions for the non-local interactions. We incorporate the sol- 
vation effect into the force law by designing a double-well potential that can account for 
contact formation and single-H 2 hydration (between the contact pair); the barrier between 
the wells corresponds to the free energy cost involved in the transfer of a water molecule out 
of the hydration shellffl. Multiple shells could be accommodated via a multiple-well po- 
tential, but this is not attempted here. The approximation of using an effective free-energy 
is justified via the observation that transfer of water from within the vicinity of the contact 
pair is faster than the contact formationlllHl. 

There is no standard way to construct a continuous de/solvation potential^ which can 
be used for Langevin simulations; we note that a previous curve fitting efforti! and a recent 
Monte Carlo studyEI utilized a discrete version of such a potential. We therefore propose an 
empirical profile based on estimated experimental parameters. Specifically, we will take for 
H-bond formation 
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Here r is the distance between NH and CO, and the integer k G N (k = 6, e.g.) is chosen 
to give specific long-range behavior. The values r = r^ b , r = rjj b , and r = represent the 
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separation distance at the contact bond position, at the the peak of the desolvation barrier, 
and in the presence of a single intervening solvent molecule separation, respectively, 
is obtained by adding the known value of r^ b to the size of a single H2O. A and r^ b can 
be determined by the strengths of the H-bondS and the desolvation barrier; the latter is 
approximated by the surface energy involved in forming a desolvation cavity, estimated to 
equal ph 0.103kcal/(mol A 2 Here we have r^ b ta 3.43 A and r^ b ta 3.92 A; these yield a 
single H 2 hydration energy of ~ —0.5 kcal/mol. 

Clearly, the design principle behind this type of formula is to have three roots for 
dXJm>(r) / dr = accounting for the desolvation barrier and for the two local minima; one can 
add more roots to address multiple solvation shell effects. Note that one might wish to fine- 
tune the potential profile more precisely (for instance, by being able to independently vary 
the width of the contact well). This can be accomplished by a more complicated expression 
(see appendix [A]). Also, the strength of de/solvation might depend on the hydrophobicity of 
local environment (i.e., de-wetting behavior) and one can certainly take this into account. 

As discussed i hydrogen bond formation has a strong angular dependence on its 

surrounding backbone; thus, one can not fully describe the interaction using the radial 
distance alone. From Fig. |l](d), we note that such an angular dependence is merely the 
requirement that all atoms near to the interacting NH, CO (i.e. the shadowed area) are 
aligned "natively". In Ref. 1221, this "alignment" effect is accomplished by introducing 
artificial repulsive forces between H-bond neighbors. Here, we propose instead a "functional- 
block" (co-plane) scheme. 



C. The co-plane approach 

From Fig. |l|(a), we notice that the peptide backbone between two contiguous C a atoms 
has a co-planar structure because of the N-C-0 bond resonance. This allows us to model the 
motion of a "co-plane" as a whole (i.e., with one dihedral angle u fixed to ~ 180°). Overall, 
each interacting unit in our system is just one block of C Q co-plane. The 2 th co-plane is 
defined by three points {X.[ , Xg , Xjj ; } where Xj , X3 are simply the two C a in the plane 
and Xn is a virtual point satisfying xfxf JL xfbcf , fig|](b). O nee the degree of freedom 
corresponding to this point is specified, all the atomic locations of the amide and carbonyl 
groups are fixed. Note that a convenient way to define the arbitrary point X2 is such that 
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the vector X^X^ points directly at the C a of the neighboring strand if the system is in 
the native /3-sheet structure; see fig.|l](c). Once Xjj is chosen, this determines the in-plane 
angles 9i and 9 2 as well as the virtual plane projection angle 9 3 . We computed all these 
objects by using the native value of bond anglesS and dihedral angles observed in regular 
/^-sheets (<f) ~ —138 ± 1°, ip ~ 135 ± l°)il. This leads to all geometrical properties of the 
virtual plane, including #i = 64.7°, 9 2 = 62.2°, 9 3 = 22.4°, XTX^ = 3.75 A, X^ = 0.57 A. 

Given the native plane, we define the range of possible non-native structures via allowing 
the orientation of the planes to vary. This yields three degrees of freedom per residue; two 
reflect the angles needed to define the direction of the fixed length vector going from one C a 
to the next while the third refers to a rolling of the plane around this vector. One can then 
work out the geometrical problem of expressing, in terms of these degrees of freedom, the 
residual terms in the backbone energy arising from the dihedral terms involving and ip, the 
bending of the bond angle ZC'C a N, and the side-chain chiralities between consecutive blocks. 
Our approach fully maintains the overall translation and rotational degrees of freedom of 
the protein molecule. This co-plane parameterization greatly reduces our computation effort 
as compared to all atom backbones, yet maintains the roll degree of freedom not present in 
the simplified C a model. As will be shown later, this degree of freedom is important for the 
study of /9-sheet cooperativity. 



D. The structural factor 



Now we use the co-plane approach to model the H-bond angular dependence. From 



fig. 0(e), we note that having a H-bond between blocks i, j requires fixing two rotational and 
two translational degrees of freedom. More precisely, we need xJpX^ antiparallel to X^X 2 \ 



¥^1 



and X 2 X 3 parallel to X 2 f Xjj ■ , whereas for translational alignment, the two sets of points 
{X^X^Xj^} and {X^, X^, X^} must be collinear. This leads us to define a structural 
factor that monitors the angular nativeness in the H-bond 
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where X^ is the unit vector connecting X.^ , xj^ . Note that x^ 1 has maximum of 1 only 
when all the alignment criteria are satisfied. 



(i)vtiM 1 ) 



1 + cos ZX^'X^X 
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We now wish to incorporate this structural factor into the Hamiltonian in such a manner 
as to ensure that at small x^, the two interacting blocks are unlikely to form H-bond. A 
simple phenomenological way to proceed is to introduce a potential profile which has two 
parameters dynamically modulated by xnb with a fixed solvation energy E so i 



+ 



Hb 



^(^Hb) 



+ 



1 



1 



1 



r 



sol\k 
Kb) 



Hb 



+ 



Kb J 



sol\k 



r 



Kb) 



1 



1 

2r^ 



3r 2k 
1 



(tyiX ty* jy-i S Ol \ Al 

'Hb'Hb'HbJ 



E 



sol 



(3) 
and 



Here the first dynamical parameter A(xHb) is determined by fixing Um>(fm» x Hb 
for another parameter, we have used a linear relation r^ b = xnb^Hb + (l — x nb)r^ h . Examples 
of this modulated double- well profile are shown in fig.g(a). Note that even incorporated with 
this structural factor, the H-bond potential still remains pairwise between two interacting 
blocks. 

Aside from the NH- ■ • OC hydrogen bond, we include in our model the bonding between 
C a -H- ■ ■ 0=C§. This interaction has a strong chiralityi! and an angular dependence that 
involves at least two contiguous blocks. From figfj(d), we note that the native configuration 
for interacting, consecutive blocks i, i + 1 and j,j + 1 (here + 1 is the block next to 
along the same strand) requires X23 (Xfg) antiparallel to X^" 1 ^ (X^ 1 " 1 ). Therefore, in 
analogy to the aforementioned H-bond ,we propose a structural factor for this additional 
interaction 
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(4) 



with an interaction potential Uc Q H-oc( r , ^chiral) taken to be similar to Unb(^, #Hb)- We use 
a C a -H- • • 0=C interaction strength half that of a single H-bondS 



E. The role of the force field 



To characterize the roles of structural factors and solvation effects in /5-sheet formation, 
we have studied four different forms of the H-bond potential. These are: 



(A) LJ'«: U£?( 
dence. 



\E 



Hb 



a L-J potential without angular depen- 
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(B) Sol^": Ug^j(r) =>• eqn.(|I]), a double- well solvation potential without angular 
dependence. 



- 2 



, a 2-dynamical-parameter L- 



(C) LJ^: ur/(r) = A LJ (x Hb )^^ [Jaa^SL 
J potential with our proposed angular dependence. This profile is designed to help 
distinguish the effects of solvation and structural factors. Thus, except for the ab- 
sence of a desolvation barrier, this potential is quite similar to the solvation one, 
eqn.(£|). Explicitly, we let a non-native alignment factor xnb < 1 gi ye r i se to a 
linear shift |U£j ff (r L j(:ZHb))| = A LJ (x R b) = x n b\Enb\ + (1 - x n b)\E sol \ for the first 
parameter. Also, to achieve a repulsive effect similar to that in eqn.(^), we set 
U Lj 9 ( r Hb) = Unb^Hb^Hb), i.e., r LJ (a;Hb) = rg b (l + a/1 + U H b(»H b , x Hh )/A L3 (x Hh )) 1/6 
for the second parameter. Fig.||(b) shows the modulation of this force field due to 
varying of structural factor. 

(D) Sol an9 : Ug™^(r) =>- eqn.(|3[), the de/solvation potential with the proposed angular 
dependence, which constitutes our full model. 



F. The replica-exchange and Multi-canonical Technique 



Throughout the entire paper, the simulation procedure employed a modified Verlet- 
Langevin algorithm. Specifically, for a particular set of coordinate x(t) with velocity v(t) 
and force f(t), the update at time t, Ax(t) = x{t + At) — x(t) is obtained by 
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with m, (, E = H({x}), rj as mass, viscosity, energy (Hamiltonian), and uncorrelated thermal 
noise, respectively, where {x} represents the set of coordinates for entire system. Since the 
coordinates used in our simulations represent Cq, co-planes, the mass m and the radius for 
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viscosity ( are approximated from the mass of a single glutamine and the C a bond length, 
respectively. 

The replica-exchange method!! and multi-canonical rescaling technique!! are used to 
obtain the density of states n(E) and hence thermodynamic properties of any given system. 
First, several simulations are performed at different temperatures. To enhance the sampling, 
configurations obtained at different temperatures T, T' were switched in between based on 
the Metropolis criterion^! that obeys detail balance, 



{x} T , {x'}t> -> {x'} T , {x}t 



if A < 0, 
if A > 0. 



(6) 



with A = [1/T— 1/T'] [E({x'})— E({x})} (see Ref. P0| and references therein for more details). 
Second, an initial guess of n(E) is obtained by using a WHAM (weighted histogram analysis 
method)-like procedure0'0, 

n{E)e- f5E 
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n(E) oc 



J2 E n(E)e-^ 
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-PE 



Pp{E) 



(7) 



where (3 = 1/ksT (ks is Boltzmann factor) and Pp{E) is the probability of energy distribu- 
tion accumulated from simulations at temperature T = 1/ksP- 

Then, we performed multi-canonical rescaling simulations to refine n(E), in which we 
used the same Hamiltonian but the forces in equation of motion (pf) were rescaled as 



VE 



8E'(E,T) 
dE 



VE with E'(E,T) as a trial function. This rescaling will yield a proba- 



bility distribution Pp{E) oc n( J E)e- /3S ' (B ' T) ; thus, if we choose E'(E,T) ~ k B Tlnn(E), 
Pp{E) will become relatively flat and hence the sampling become more accurate (seeHil and 
reference therein for detailed discussion). Finally, the refined n(E) is obtained iteratively, 



n(E)i oc 



Pp{E)i 

where i indexes the iterations. 
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G. The local melting approach 



Aside from examining the global cooperativity, we have also explored the thermody- 
namics of local binding events in the model system. The idea is that in the absence of 
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hydrophobic clustering, the system cooperativity (if any) must arise from a scaling up of 
small scale bound structures. In other words, folding of a specific part of the overall struc- 
ture will be aided by any contiguous folded regionsil; this can be investigated by exploring 
the conditional probabilities of residues of being folded (i.e., in their native position and 
orientation) or unfolded depending on whether their neighbors are folded or unfolded (i.e., 
the contact correlations). Specifically, we investigated two cases; the ensemble that has all 
blocks retained in their folded configurations aside from either a) a few blocks at the end of 
one /3-strand or b) a few blocks buried inside the sheet. 

To study these local sub-structures, we have assumed here that their thermodynamics 
will not be affected significantly by the dynamics of very distant blocks. This allows us to 
compute a "conditional" partition function in which, except for those specified blocks, the 
rest can be treated as frozen in their native positions. Doing this, the number of degrees 
of freedom is significantly reduced and the conditional partition function can be obtained 
by explicit numerical integration. For instance, in the single-block /3-tail case, the partition 
function reads / dX 2 dX 3 8(X^-X^ N )5(X^-%jQ N ) e - H / kBT where XTX^ = 3.75 A, 

N 

X2X3 = 0.57 A are native constants, and X2, X3 are the virtual points of the movable 
terminal block (note that its Xi is fixed). Then, because of the 5 functions, the partition 
function is reduced to an integral over three angles defining the orientation of the terminal 
plane. 

III. RESULTS 

A. Global Cooperativity 

The first system we studied is a 3-strand /3-sheet with a total of 6 H-bonds equally 
distributed between adjacent strands. Since there is no sequence specificity, we artificially 
tethered the H-bonds at the 2 expected turns to maintain the /3-sheet propensity. In Fig. |3|, 
we show the computed specific heat for the four different model forces. Basically, we found 
that there is no sharp transition for the three models LJ^ iX , LJ ang , Sol^ if non-specific 
H-bonds are allowed to develop on the peptide backbone; this is because their dominant low- 
energy states are ensembles with non-specific, randomly collapsed structures rather than a 
uniquely defined /9-sheet (similar to the results observed by Guo & BrooksB) . The complete 
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model Sol an9 , however, can define a unique native state and give rise to a sharp transition, 
as seen in Fig. 0(a). 

Next, we added a Go-like restriction to the models LJ^ X , LJ an9 , Sol , i.e., allowing 
H-bond formation only between those residues that have such an interaction in the native 
/3-sheet structure^. Of course, the entire justification of this approach is absent in systems 
with homogeneous interaction as the one of primary concern here, but it is still worthwhile 
comparing the results of our proposed complete model with these alternatives. Note how- 
ever that our Go-like restriction does not include any additional restrictions on backbone 
configurations to favor a particular structure (i.e., native state); this is very different from 
the energetic restrictions commonly used in the off-lattice protein folding literature^. In 
Fig. 0(b), we show that even with Go-like restrictions to reduce the non-specific collapsed 
ensemble, the LJ^ X model can not give rise to a sharp transition compared with models 
incorporating either the de/solvation effect or angular dependency. This is consistent with 
results from other groupEl and also the aforementioned reasoning that with only two contact 
couplings for each interacting unit, a simple force field for H-bond formation can not produce 
systemic cooperativity. 

The difference between the various force models becomes much clearer in larger systems. 
Fig. |3](c) shows the specific heat diagrams for a 4-strand /3-sheet with a total of 15 H-bonds 
equally distributed between adjacent strands. Now, without a significant increase in the 
transition temperature, the peak magnitude of the specific heat for model Sol a ™ 9 is 100 fold 
larger than that of the previous small system, whereas there is still no significant transition 
for the L-J Go-like potential. This difference is further realized by examining the density of 
states. In Fig. §](a), we note that there is only one dominant ensemble for model L^ lx in 
the low-energy region; this ensemble mixes the native and partially folded states. In model 
LJ an9 , on the other hand, the unfolded ensemble (E ~ 0) is slightly separated from the 
folded one; however, the folded phase has a larger entropy than that of the unfolded state, 
and thus there is no transition. In the solvation model, as shown in Fig. |](b), there is a 
clear separation between the folded and unfolded ensemble, with the unfolded states having 
the largest entropy, as is crucial for having a sharp transition. 

Apparently, both the angular dependence and solvation effect slightly enhance the system 
cooperativity, but only a combination of both can yield the desired results in the system of 
primary concern here. This point has not been adequately addressed in previous workslii'0. 
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Also, we notice a high energy population appearing in model LJ an9 , Sor ia: , and SoP ns . Ana- 
lyzing the contact profile in this additional phase, we found that its dominant configurations 
are those with significant intervening of folding and unfolding in the entire structure, i.e., 
"droplet" -like structures@S. As will be elucidated in next section, this intervening can lead 
to an interfacial energetic penalty resulting from backbone twisting (enhanced by the an- 
gular dependence) and (re)-desolvation barrier crossing at the interface between contiguous 
folded and unfolded regions; thus the partially unfolded ensemble is shifted from low energy 
(as in model LJ^ X ) to the high energy region. Consequently, the completely folded and 
unfolded ensemble are well separated, leading to a sharp transition. 

Of course, whether these partially folded droplet configurations can be shifted to high 
energy depends not only on the choice of the H-bond potential (to create the interfacial 
penalty), but also the topology of the /3-sheet. When the H-bond energy inside the folded 
droplet can compensate the interfacial penalty, the structure will no longer stay at high- 
energy phase. To show this explicitly, we studied a system that is less symmetric on the 
/5-sheet plane; we chose a 3-strand /3-sheet with long strands, looking at the effect of in- 
creasing the strand length. We started with a 3-strand /3-sheet with a total of 30 H-bonds 
equally distributed between adjacent strands. Fig. |5|(a,b) shows its n(E) and specific heat, 
which are very similar to those of the 4-strand, 15 H-bond system, in that they both have 
well-separated unfolded/native states, and partially unfolded high-energy populations. An- 
alyzing the states in the low and high energy partially folded ensembles, we found that their 
dominant configurations are droplets with one (in the low energy phase) or multiple (in the 
high energy phase) interfaces between contiguous un/folded regions, i.e., a partially folded 
/3-sheet (or hairpin) buried in unfolded coils (illustrated by examples in Fig. |5](a)). 

When the system size increases to a total of 40 H-bonds, however, (see Fig.|6](a,b)), we find 
an increased weight for the partially folded ensemble in the low energy phase ranging from 
> E > En where En is the native state energy. Unlike the smaller, or symmetric system, 
here the interfacial energy penalty between contiguous un/folded regions, can not compete 
with the large H-bond energy at the bulky folded portion; this allows the partially folded 
ensemble to have a low energy. Moreover, in a system with such an elongated asymmetry, the 
partially folded portion can freely move along the /3-sheet, or freely slide along the chain with 
many "mis-pairing" H-bonds (i.e., inconsistent with native pairing). Both effects can then 
increase the entropy of the partially folded (or even misfolded) ensemble and hence smears 
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out the separation between completely un/folded ensemble and partially folded ensemble, 
Fig. |^(a). The smearing becomes more significant in a larger 3-strand /9-sheet with a 
total of 50 H-bonds, Fig. §](c). As a consequence, the system becomes less cooperative 
and an additional continuous transition occurs between these partially un/folded ensemble, 
fig.^(b,c). Developing a phenomenological model based on these results to study how the 
system cooperativity depends on the /3-sheet topology, therefore, is one of our future goals. 

B. Local Bonding Effects 

In this section, we examine how the model system acquires "global" cooperativity (in 
the absence of hydrophobic clustering) by exploring the thermodynamic effects of local 
un/bonding. Here, since we are dealing with only a small number of degrees of freedom, we 
could calculate all thermodynamic properties by explicit integration of the partition function. 
As an illustration, we first present the results of the single-block /5-tail case, Fig. [7|, where 
we plotted the probability distribution function (pdf) P(r) of the interacting distance r in 
the native NH- • • OC pair between the terminal block (the one allowed to fluctuate) and its 
native partner (taken to be frozen in space). 

In this case, the maximum possible NH- • ■ OC distance as determined by the range of 
integration, is smaller than that necessary to fit a water molecule in between and thus there 
cannot be any true secondary solvated minimum. We do see however the fact that having 
the solvation barrier and (to a lesser extent) the angular factors make a big difference in 
the structure of P(r). Specifically, there is no hint of well-separated 2-state behavior for 
model ~LJ* lx . However, a small barrier appears if the potential has an angular dependence 
(model LJ an9 ), and becomes significant if solvation effects are included (mo del Sol /ix , Sol an9 ). 
Here, we note that because of the bond angle ZC'CqN potential, P(r) is concentrated on 
the "ring" that satisfies ZC'C Q N = ZCC^N^,^. On this ring, only a few "points" will 
give minimal dihedral angle (ip, <ft) energies. These points correspond to the sharp peaks in 
fig.0 that have free energy difference ~ + ~ lkcal/mol from their surrounding; if we 
reduce the bond angle stiffness kg, these sharp peaks become broadened (data not shown). 
Apparently, this effect comes from the backbone energy and is independent of the choice of 
H-bond potential. 

Next, we studied the "buried" cases with n blocks allowed to fluctuate. In this situation, 
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the configurations of fluctuating blocks are clearly constrained to a great extent by their 
fixed neighbors. Thus, we found that for n — 1, there is neither 2-state behavior nor any 
qualitative difference among the results of different force fields (data not shown); all signifi- 
cant probability is confined in the H-bond well, indicating a strong topological confinement 
to force the folding of a single unfolded block buried amidst folded ones. When n = 2, we still 
do not have enough freedom of backbone motion to allow for a solvated second minimum for 

o 

the main H-bond; the maximum separation is slightly above 4.1 A, Fig. |8|(a). Nonetheless, 
we see clear differences among different models. For the simplest LJf lx model, there is no 
confinement to a narrow contact well. Adding the angular factor back in, we create a barrier 
as the NH- • • OC distance is increased. This barrier is due to the modulation of the LJ po- 
tential by the angular factor, making it repulsive if the orientation is not properly matched. 
Alternatively, adding the solvation effect as in model Sol^ lx also introduces a barrier; this is 
because rolling of the co-plane allows for a wide separation (and hence going up and down 
the solvation barrier) in the C a H- • • OC interactions. Finally, putting both effects in leads 
to a very large degree of confinement in the H-bond well, suggesting that local coupling is 
enhanced by both structural and solvation effects such that an unfolded block is strongly 
biased to fold if contiguous to folded ones. 

To further characterize the difference between the varying choices of force models, we 
explored the un/folding correlation in the 2-block buried case. Specifically, we computed 
the conditional probability function P(r) for one block with respect to the un/folding state 
of another (defined as folded/unfolded if its native H-bond separation r 2 ^ ?Hb)> P( r i\ r 2 > 
r Hb)' P{ r i\ r 2 < r Hb)' where ri, r 2 are their native H-bond separations, respectively. In other 
words, if the system has a good local coupling, un/folding of one block should force the 
un/folding of another, implying a well-separated P(ri\r 2 ^ r Hb)- m Fig. H(b), however, 
we found that P(ri\r 2 ^ r# b ) are well separated only in model Sol an9 . Here, the separated 
probability distributions have been individually normalized; the actual probability for being 
at the larger r is extremely small as is evident from Fig. |8](a). Also, note that it is not r 
in the figure which is important - this H-bond separation is quite small. Instead, the two 
peaks correspond to either having the full H-bond energy (by satisfying both the distance and 
angular conditions) or not having that energy; the distribution at larger r does not have the 
correct angular configuration. To clarify this point, we plotted the conditional probability 
function on the variation of H-bond energy E instead of distance r, i.e., P{Ei{r\)\r 2 > 
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r^ h ) , P(Ei(r 1 )\r 2 < r^ h ), Fig. |](b). Now, we see clearly that only the full model Sol ans has 
a significant separation effect. 

C. Collective Re-solvation Kinetics 

The conclusion from the local coupling studies is that the full model being proposed here 
leads to strong correlations between the behaviors of neighboring blocks. For the system as 
a whole, this should mean that complicated configurations with repeated interfaces between 
unfolded and folded residues would cause additional energetic penalties and thus can be 
strongly suppressed in favor of simpler configurations with large patches of residues being 
either all in contact or all not in contact. Of course, whether the system can therefore 
acquire an global cooperativity would depend on its topology as well, as revealed in earlier 
section. 

Now, the interesting part of this conclusion, as will be illustrated later, is that during 
unfolding (folding), the solvation (desolvation) process must occur in a collective way. To 
see this explicitly, we show the pdf and re-solvation kinetics of a 10-block buried case in Fig. 
H(a,b). Because of the large barrier needed to melt the blocks, here we set the simulation 
temperature far beyond the physiological range. 

From Fig. |9](a) (using Sol ara9 ), we note that the pdf of native H-bond distances are well 
separated by a free energy (-/c^T In P(r)) barrier, resulting in an ordered (H-bond formed) 
and disordered (H-bond not formed) phases. Interestingly, even governed by the same force 
law, the un/folding barriers for the various molten blocks are not quite the same. For blocks 
close to the confined ones (those frozen in the space), the barrier progressively increases, and 
the disordered phase becomes less significant. This reduction of disordered phase manifests 
a stronger folding bias for any unfolded residues near those folded ones. Thus it is not 
effective to unbind the blocks one by one, since the unbound block will rebind long before 
its neighbor has unbound. This can be seen in the high-temperature unfolding simulation, 
Fig. |S](b), in which we note that the resolvation (i.e., infiltration of water molecules) process 
involved many residues simultaneously. In Fig. f|(c), we show that this collective water 
infiltration indeed occurs in unfolding of the entire /3-sheet. 
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IV. DISCUSSION 

We have explored the mechanism of /3-sheet cooperativity without side-chain interactions, 
with different choices of the H-bond potential. Our results suggest that in the absence of 
hydrophobic clustering, the /3-sheet cooperativity arises from a coupling of structural factor 
(contact angular dependence), protein de/solvation effects, and the system topology. Since 
this phenomenon is sequence-independent, the results can be considered to apply in general 
to /3-sheet structures. 

Given the coupling number for each H-bond interacting unit, our results differ from most 
2-dimensional pairwise-interacting system, in that there the minimal number of coupling 
required for global cooperativity is larger than the one present here. Thus, the coupling 
between backbone energy (the angular dependence) and pairwise H-bond interaction (in- 
corporated with solvation dynamics) must yield a strong many-body effect, which has been 
argued to be crucial to produce systemic cooperativity in homogeneous system such as are 
the primary concern hereil. Apparently, our model system has a default many-body effect, 
the dihedral confinement via the corresponding terms in the backbone energy. 

As illustrated in our study, when the system is close to the fully folded state, the dihedral 
confinement is strong enough to complete the folding (the n = 1-block case). However, it 
is too weak to confine the backbone collectively if there is a larger unfolded portion in the 
/3-sheet, as illustrated in the pure Lennard- Jones potential case. Instead, it is the angular 
dependence which incorporates the contact strength into the backbone Hamiltonian and 
thus amplifies the confinement. This leads to a competition with solvation dynamics; in 
other words, while the solvent tries to pull the interacting blocks apart, the confining force 
amplified by folded neighbors can hold them back. As this behavior occurs mainly between 
contiguous un/folded regions, it creates an "interfacial" energetic penalty and the "droplet" 
picture proposed in previous worksiHl (now allowing a measurement the droplet surface 
tension). Consequently, in a partially folded structure, any small regions of unfolded blocks 
contiguous to folded ones are thermodynamically unfavorable and will be forced to fold 
or will unfold the folded ones0. Thus, while there has been a concentration of effort on 
understanding the role the sequence heterogeneity for protein folding and design, we would 
like to point out that this generic cooperativity buried in the /3-sheet architecture can also 
serve as a building block to construct macromolecular structure. However, this same building 
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block can also provides the mechanism for pathological aggregation. 

What then would be the physiological role and evolutionary principle of having sequence 
heterogeneity, in the presence of possible /^-architectures? Certainly, with the fast time scale 
(of order of 0.1 ns, estimated from our Langevin simulations) for a single C a block folding, 
any pre-folded /3-architecture with ordered, exposed amide and carbonyl oxygen could lead 
to a seeding process and non-specifically precipitate surrounding unfolded peptides; this 
has been observed in the elongation process of amyloidogenesiJi. The physiological role of 
heterogeneity, therefore, would be to either distort the orientation of exposed amide and 
carbonyl groups into a non-aggregation competent form, or provide "hot" nuclei to bias 
the system into the native fold. Indeed, this idea has been verified in a recent tryptophan 
zipper modefi Exploring the competition between native folding due to specific sequence 
heterogeneity and pathological aggregation due to generic cooperativity is perhaps the most 
important direction for future study. 

As a side point, to elucidate the effect of local un/binding, we have introduced a local 
melting method. This method can be further used to explore local nucleation or de-solvation 
events in any system lacking long-ranged spatial couplings. Also, instead of simulating 
all backbone atoms or a simplified C a model, a functional block (co-plane) approach was 
proposed. This approach greatly reduces our computational effort; however, it still keeps 
the right angular degrees of freedom (missing in C a model) and gives rise to a structural 
factor (i.e., backbone alignment) that helps account for /3-sheet cooperativity. 

Also, we have introduced an approach to de/solvation which can be used directly for 
Langevin simulations. It has been noticed that the (de)-solvation effects are important in 
protein folding and functioning, but no standard model has been proposedlil00SEiiiilil. 
Also, we note that solvent dynamics can be regulated by system hydrostatic pressure^!. 
Interestingly, many adhesion and hemostasis-related proteins are very sensitive to shear 
stress and pressure change; perhaps a local de/re-solvation event might trigger a cooperative 
conformational change which ultimately has large scale consequencesEl. Thus, microscopic 
cooperativity can be amplified into a macroscopic response; this also needs to be explored 
in future work. 
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VI. FIGURE CAPTIONS 

Fig.l (a) The co-planar structure with (b) the three virtual points {X 1 ,X 2 ,X 3 } and 
dihedral angles uj, </>, ip labeled. The virtual bond lengths and angles are determined so as 
to give (c) a regular, symmetric /3-sheet structure with side-chains and C Q H bonds alter- 
nately pointing up and down periodically, (d) The side-chain and C a -H- • -OC interactions 
are indicated by light-gray, gray arrows, respectively, with the shadowed area indicating the 
atoms involved determining the angular dependence of a single H-bond. (e) If two interact- 
ing blocks i,j are to have a native H-bond (the gray arrow), proper alignment of two pairs 
of planar vectors and collinearity of the two sets of points (X^X^X^; X^Xg^X^) are 
required. 

Fig. 2 The structural-factor-dependent H-bond potential profiles in units of kcal/mol 

o 

with interaction distance r (NH- • • OC) measured in A- (a) The (de)-solvation force field 
UHb( r , #Hb)- The barrier height between r = r^ h and the solvent-separated minimum (r = 
r^) represents the free energy cost in desolvation cavity formation. Here we have graphed 
UHb(^, a^Hb) from XHb = to 1. The dashed curve represents the modulation of the contact 
minimum (r£ b , Unb^Hb; ^Hb))- (b) The Lennard- Jones force field Ulj 9 (V), where the profile 
is modulated by changing the contact minimum (long-dashed curve) |ULj 9 (rLj(a;Hb))| — 
XHbl-^Hbl + (1 — x~tib)\E sol \ from the H-bond contact energy E^b to the single H 2 hydration 
E so1 . The major difference between (a) and (b) is the desolvation barrier. 

Fig. 3 (a) The specific heat for a 3-strand with 6 H-bonds modeled by different force 
fields (without Go-like restrictions). Here, because of the absence of Go-like pairing, the 
three force field LJ /ix , LJ a ™ 9 , and So\ fix are unable to define a unique structure and hence 
there is no obvious peak (i.e., transition) in their CV diagrams. Whereas force field Sol"" 9 
can define unique /9-sheet and hence leads to a sharper transition, (b) Same as (a) except 
for the Go-like restrictions to model L3^ tx , L3 ang , and SoV lx here. Note that the transitions 
for these three force fields become more apparent but still not compatible with Sol an9 . (c) 
The specific heat for a 4-strand with 15 H-bonds modeled by different force fields. Now the 
transition for Sol™" 9 become much stronger and there is almost no transition for L-J type 
force fields (inset). 

Fig. 4 The density of state for a 4-strand /5-sheet with 15 H-bonds. (a): The density 
of state for L-J type potential (with Go-like restrictions). Note the distribution of entropy 
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and the high-energy population appearing in model LJ ans ; this population contains partially 
folded state, (b): The density of state for Sol type potential. Note that a good separation 
between the unfolded and folded ensemble only appeared in model SoP™ 9 . 

Fig. 5 The density of state (a): n(E), specific heat (b) C v = ^p-, and energy (E) (inset 
in (b)) for system asymmetrically elongated at one direction on its /3-sheet plane (using 
model Sol an9 ). Here the system is a 3-strand /9-sheet with total 30 H-bonds equally dis- 
tributed between adjacent strands. Note the suppression of low-energy ensemble (between 
the completely unfolded and native states) and the appearance of high-energy population. 
The dominant configurations in these ensemble (insets) are "droplets" with interfaces be- 
tween contiguous un/folded regions. Note that the low-energy configuration has less interface 
compared with high-energy one. The negative energy of the unfolded state comes from the 
two tethered /3-turns. 

Fig. 6 (a): The n(E) for a 3-strand /9-sheet with total 40 H-bonds. Note the growth of 
partially folded ensemble (inset in (a) is a typical example of their configurations) at low- 
energy phase and smears out the separation between unfolded and the native states. This 
can lead to an abrupt transition (with the native state) at low temperature phase (pointed 
by double arrows in (b)) and a continuous transition (with other high-energy unfolded 
states) at high temperature phase (pointed by a single arrow in (b)). When the system size 
increases to totally 50 H-bonds (c), the smearing becomes more significant (arrow in (c)) 
and the system eventually loses its cooperativity. 

Fig. 7 The numerical results for the probability distribution function (pdf) P(r) of 
NH- • • OC separation r of the "molten" block respective to its native partner in the (3- 
tail single-block case (indicated in the inserted diagram; black: frozen, gray: mobile). Here 
temperature: 25°C and integration step 59 = 10~ 4 . Also, the curves have been adjusted 
up/downward for a better visual comparison. Note that the barrier for model Sol-^ is much 
larger than that of So\ an9 . This is because the desolvation barrier height is fixed in the 
former case, but modulated (by the angular factor) in the latter one. 

Fig. 8 The numerical results for buried two-block cases, (a) The pdf P(r) for the native 
H-bond separation (average of the two symmetric mobile blocks), (b) The normalized 
conditional pdf in first molten block, -P(ri), with respect to whether the second block is 
unfolded (defined as if r 2 > r^ b ) or folded (defined as if r 2 < r^ h ). Here r 1; r 2 are the 
H-bond separations of the two molten blocks with their native partner, respectively, and 
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j driP(ri\r 2 ^ r^ b ) = 1. (c) The normalized conditional pdf in first molten block, P(Ei), 
plotted on its H-bond energy E]_. 

Fig. 9 (a): Langevin simulation results of a ten-block buried case. Each curve represents 
the corresponding block in the inset diagram, averaged over the symmetrical pair. The 
simulation is averaged over 25 series of 1 ms runs (At = 5 x KT 5 ps). (b): Snapshot 
of re-solvation simulation at T = 850°K where each heavy atom is labeled by different 
symbols depending on the corresponding H-bond interacting distance r: "•" (r < rj^ b ), "o" 
(>Hb < r < rgb), and " " (r > 7g b ). (c): Snapshot of unfolding simulation of a 4-strand 
/3-sheet with 15 H-bonds at T = 800°K where we have tethered the H-bond at the turn area 
to maintain the /3-turn propensity. The time courses are plotted on the change of total H- 
bond (indicated by r < r^ b ) number Qnb and energy Enb (kcal/mol), and configurations at 
different snapshots (a,b,c,d,e) are projected to left-hand side. Note the successive infiltration 
of water, and the overshot of Enb at the abrupt transition point, manifesting a collective 
barrier crossing for the re-solvation process. 



APPENDIX A: MODIFIED DESOLVATION POTENTIAL 



The simplified double-well potential provides the basic insight on how de/solvation effect 
involves in protein cooperativity. However, one might wish to modify this model to include 
more control parameters. For this purpose, we propose an alternative potential 
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and r as the distance between NH and CO, fig.|l](a). Here, the interacting distance for 
formation of contact, desolvation barrier, and single- solvent separation are r = To, r — r\, 
r = r 2 , with free energy amplitude D , D\, D 2 , respectively, and the width of the contact 
well is controlled by the span between r = r x o and r = r x \. Normally, r 2 is separated from 
ro by the size of a single water molecule. The integer m > 2, on the other hand, is chosen 
for the long-distance behavior, and the six unknowns B,hi,h 2 ,A,b,s can be determined by 
D , Di, D 2 , r , r 2 and one continuity requirement (dUnB/dr at r = r xl ). Using this model, 



23 



one has a complete control on the potential profile (the well width for the contact entropy, 
e.g.). 

To incorporate the fully Lennard- Jones behavior into the profile at both short (r <C r^J 
and long (r ^> rg^) range region, we also proposed another force field 



Urn ,(/•)= U-) 



r -°) -2 



Br 



6m 



1 + Cr 12m 



with m as a positive integer and A, B, C > 0. Here the second term is designed to produce 
the desolvation granularity effect. 
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